Reward-Rate Maximization in Sequential Identification under a Stochastic Deadline

نویسندگان

  • Savas Dayanik
  • Angela J. Yu
چکیده

Abstract. Any intelligent system performing evidence-based decision making under time pressure must negotiate a speed-accuracy trade-off. In computer science and engineering, this is typically modeled as minimizing a Bayes-risk functional that is a linear combination of expected decision delay and expected terminal decision loss. In neuroscience and psychology, however, it is often modeled as maximizing the long-term reward rate, or the ratio of expected terminal reward and expected decision delay. The two approaches have opposing advantages and disadvantages. While Bayes-risk minimization can be solved with powerful dynamic programming techniques unlike reward-rate maximization, it also requires the explicit specification of the relative costs of decision delay and error, which is obviated by reward-rate maximization. Here, we demonstrate that, for a large class of sequential multihypothesis identification problems under a stochastic deadline, the reward-rate maximization is equivalent to a special case of Bayes-risk minimization, in which the optimal policy that attains the minimal risk when the unit sampling cost is exactly the maximal reward rate is also the policy that attains maximal reward rate. We show that the maximum reward rate is the unique unit sampling cost for which the expected total observation cost and expected terminal reward break even under every Bayes-risk optimal decision rule. This interplay between reward-rate maximization and Bayesrisk minimization formulations allows us to show that maximum reward rate is always attained. We can compute the policy that maximizes reward rate by solving an inverse Bayes-risk minimization problem, whereby we know the Bayes risk of the optimal policy and need to find the associated unit sampling cost parameter. Leveraging this equivalence, we derive an iterative dynamic programming procedure for solving the reward-rate maximization problem exponentially fast, thus incorporating the advantages of both the reward-rate maximization and Bayes-risk minimization formulations. As an illustration, we will apply the procedure to a two-hypothesis identification example.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Multiprocessor System with Non-Preemptive Earliest-Deadline-First Scheduling Policy: A Performability Study

This paper introduces an analytical method for approximating the performability of a firm realtime system modeled by a multi-server queue. The service discipline in the queue is earliestdeadline- first (EDF), which is an optimal scheduling algorithm. Real-time jobs with exponentially distributed relative deadlines arrive according to a Poisson process. All jobs have deadlines until the end of s...

متن کامل

Sequential Hypothesis Testing under Stochastic Deadlines

Most models of decision-making in neuroscience assume an infinite horizon, which yields an optimal solution that integrates evidence up to a fixed decision threshold; however, under most experimental as well as naturalistic behavioral settings, the decision has to be made before some finite deadline, which is often experienced as a stochastic quantity, either due to variable external constraint...

متن کامل

Finite-time Analysis for the Knowledge-Gradient Policy

We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective of interpreting Bayesian ranking and selection problems as adaptive stochastic multi-set maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In a...

متن کامل

Optimal Sequential Detection with Compressive Sensing

We propose a new framework that combines stochastic optimization and compressive sensing tools to recover the support of a portion of an unknown vector and acquire a utility which decreases linearly with the number of measurement, and increases with the reward associated to a given state of each component. We model a utility function to strive the trade-off between number of measurements and nu...

متن کامل

Energy Scheduling in Power Market under Stochastic Dependence Structure

Since the emergence of power market, the target of power generating utilities has mainly switched from cost minimization to revenue maximization. They dispatch their power energy generation units in the uncertain environment of power market. As a result, multi-stage stochastic programming has been applied widely by many power generating agents as a suitable tool for dealing with self-scheduling...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • SIAM J. Control and Optimization

دوره 51  شماره 

صفحات  -

تاریخ انتشار 2013